Protocol to use TopNet for gene regulatory network modeling using gene expression data from perturbation experiments

Summary Inference of gene regulatory networks from gene perturbation experiments is the most reliable approach for investigating interdependence between genes. Here, we describe the initial gene perturbations, expression measurements, and preparation steps, followed by network modeling using TopNet. Summarization and visualization of the estimated networks and optional genetic testing of dependencies revealed by the network model are demonstrated. While developed for gene perturbation experiments, TopNet models data in which nodes are both perturbed and measured. For complete details on the use and execution of this protocol, please refer to McMurray et al. (2021).


SUMMARY
Inference of gene regulatory networks from gene perturbation experiments is the most reliable approach for investigating interdependence between genes. Here, we describe the initial gene perturbations, expression measurements, and preparation steps, followed by network modeling using TopNet. Summarization and visualization of the estimated networks and optional genetic testing of dependencies revealed by the network model are demonstrated. While developed for gene perturbation experiments, TopNet models data in which nodes are both perturbed and measured. For complete details on the use and execution of this protocol, please refer to McMurray et al. (2021).

BEFORE YOU BEGIN
We have divided this section into two parts. First, we describe the cell culture. Next, we describe the computational setup.

Cell culture
The protocol below describes the specific steps for using young adult mouse colon (YAMC) cells derived from the Immorto-mouse (also known as the H-2Kb/tsA58 transgenic mouse) (Jat et al., 1991;Whitehead et al., 1993). These cells express temperature-sensitive simian virus 40 large T (tsA58 mutant) under the control of an interferon g-inducible promoter and have been used by our group to enable study of oncogene cooperation in colon cells. However, we have also adapted this protocol for use in human colorectal cancer cells lines DLD-1, HT-29, SW480, SW620, human prostate cancer cell line PC3 and human breast cell lines MCF10A and MDA-MB-231 cells, using appropriate culture conditions for each of those cell lines.

Institutional permissions
A material transfer agreement (MTA) is required to obtain young adult mouse colon cells. We obtained these cells under an MTA from R. Whitehead and A.W. Burgess, who originally derived the a. $9% fetal bovine serum (FBS). b. 13 ITS-A. c. 2.5 mg/mL gentamycin. d. 5 U/mL interferon g. e. Prepared media can be stored at 4 C for up to one month.
Note: Interferon g stocks are made at 250 U/mL in 13 PBS containing 0.1% BSA. The resulting solution is filter sterilized and stored at À20 C for up to 6 months.
Optional: Thaw frozen aliquots of YAMC cells or derivatives for routine culture at 33 C 6. Remove cell aliquots from liquid nitrogen storage.
a. Loosen tube caps to prevent explosion from escaping nitrogen, but do not remove caps completely as this would compromise cell sterility. i. YAMC cells and derivatives are frozen in 33 C media supplemented with 10% DMSO.
ii. Cells are never frozen from 39 C culture conditions. iii. Aim to freeze 10 6 -5 3 10 6 cells in 1 mL of freezing media per vial, allowing these to be thawed onto 10 cm TC dishes. 7. Float tubes in 37 C water bath for 2 min. Remove tubes from water and pat dry. 8. Remove entire volume of cells and media and add to 10 mL media on a collagen-coated dish. 9. Place plated cells into 33 C tissue culture incubator with 5% CO2. After cells are allowed to adhere to dishes, between 16 -24 h after plating, aspirate media and replace with fresh 33 C media to remove residual DMSO from thawed cells.
Note: YAMC cells and derivatives are frozen in 33 C media supplemented with 10% DMSO. Cells are never frozen from 39 C culture conditions. Aim to freeze 10 6 -5 3 10 6 cells per vial, allowing these to be thawed onto 10 cm TC dishes.
Perform routine maintenance of cultured cells 10. Every 3-5 days, YAMC parental cells are split at a ratio of 1:2 to 1:4 and mp53/Ras cells are split at a ratio of 1:8 to 1:12. a. Determination of when to split is done by visual inspection of cell density on plates. YAMC cells do best when kept between 40 -90% confluent. i. Plating too sparsely may cause YAMC cells to die.
ii. Plating too densely may cause YAMC cells to undergo growth arrest due to contact inhibition. b. By virtue of having undergone malignant transformation, mp53/Ras cells are not as sensitive to cell density on dishes. Moreover, they divide faster than YAMC cells and thus are better handled by splitting at higher ratios. 11. To split cells, trypsinize using 13 trypsin-EDTA (Invitrogen, #25300) at room temperature for 5-10 min. Dilute with 33 C culture media to inhibit trypsin, then plate on unused, collagen coated TC dishes at ratios described in step 10.
Note: Cells can also be counted and plated at defined densities as when temperature switching by plating the cells in media for growth at 39 C (described in step-by-step method details, steps 12 and 13).
12. Thaw and culture FNX-E viral packaging cells (also known as Phoenix-ECO cells, available from ATCC) in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% FBS, 50 mg/mL kanamycin and 2.5 mg/mL gentamicin. a. These cells are derivatives of HEK293T and contain stably integrated plasmids to allow mouse-infectious retroviral vector production. b. Moreover, they are amenable to transfection with lipid-based transfection reagents or with calcium phosphate. The latter is our preferred protocol, outlined below. 13. Prepare DNA for each perturbation that will be performed in a given experiment. a. A simple experiment would include two plasmids, each used to transfect and infect an independent plate of cells. For example, you would use a pBabe construct containing cDNA for a gene of interest and an empty vector control (i.e., pBabe with no insert). b. Plasmid DNA encoding retroviral constructs, such as pBabe vectors encoding selected cDNA or pSuper-retro vectors expressing selected shRNA, should be obtained by carrying out the manufacturer's protocol (QIAGEN) for maxi-prep of recombinant E. coli carrying the appropriate plasmids. c. Once prepped, plasmid DNA can be stored at À20 C and utilized for multiple experiments for up to one year. 14. To generate virus for infection of YAMC or mp53/Ras cells, plate FNX-E on a 10 cm dish at a density of 2.5 3 10 6 cells in 5 mL of media, at least 6 and no more than 24 h prior to transfection. a. Set up individual plates for each population of virus to be generated (i.e., make separate plates for vector controls and for each perturbation vector that will be generated to remove potentially contaminating FNX-E cells. b. Virus can also be aliquoted and frozen at À80 C to store. No special measures are necessary in this case, as the process of freezing will kill any FNX-E cells that may be in the supernatants.

Computational setup
We now describe the steps necessary to install the software packages used by the TopNet algorithm.
Install the ternarynet R/Bioconductor package used by TopNet for network modeling TaqMan probe detecting murine Death-associated protein kinase 1 (Dapk1)

Applied Biosystems Mm00460518_m1
TaqMan probe detecting murine Zinc finger protein 385A (Zfp385) Applied Biosystems Mm00600201_m1 Protocol Alternatives: As above for 33 C media.
Alternatives: As above for 33 C media.
CRITICAL: Adjust pH to 7.1-7.3. The pH of this solution is the key to transfection efficiency. Filter sterilize.
Alternatives: Lipid-mediated transfection methods may be used as an alternative to calcium phosphate transfection. b. Polybrene can be stored at 4 C for up to 6 months or can be frozen at -20 C for longer periods. 3. One day after plating cells (18-24 h), aspirate media from plates and pipet collected supernatants containing viruses on to each plate to be infected. a. Retrovirus-containing supernatants should have a 5 mL volume if generated as described in ''before you begin'' steps 20 and 21 above. b. Use only one type of virus per plate of cells. 4. For mock infection, replace media with 5 mL of fresh 33 C media. 5. Add 8 mg of polybrene per mL of supernatant to each dish being infected (i.e., dilute 1003 or 10003 stock to 13 final concentration in volume of supernatant on each dish). 6. Incubate for 1.5-3 h at 33 C in CO 2 incubator. 7. Aspirate supernatant and replace with second aliquot of supernatant for virus containing the same perturbagen. In other words, the same dish should receive multiple rounds of infection with empty vector or virus harboring a single cDNA or shRNA insert. a. Generally, we do 2 rounds of infection but YAMC and mp53/Ras cells can tolerate up to 6 rounds of infection before experiencing toxic effects from exposure to polybrene. 8. Following all desired rounds of infection, replace supernatant with fresh 33 C media. 9. Allow cells to recover for 48-72 h prior to beginning drug selection. 10. For drug selection of YAMC or mp53/Ras cells, i.e., elimination of uninfected cells in the population, add the appropriate selective agent to 33 C media at concentrations listed below and depending on the resistance marker present in the retroviral vector used. a. Puromycin: 5 mg/mL media; Selection takes 2-4 days. b. Hygromycin: 200 mg /mL media; Selection takes 4-7 days. c. Bleomycin: 100-400 mg /mL media; Selection takes up to 2 weeks. d. Neomycin/Geneticin: 100-400 mg /mL media; Selection takes up to 2 weeks. 11. Once perturbed populations are derived, they should be maintained in media with the appropriate selective agent during routine maintenance / cell splitting.

STEP-BY-STEP METHOD DETAILS
CRITICAL: Selected cell populations should be used for RNA isolation, tumor formation studies and other experimentation within two weeks of derivation. Long-term culture of perturbed populations can allow for drift in the population and altered cell behavior over time in culture.
Note: For experiments described in McMurray et al., Cell Reports, 2021, we freshly transduced and selected polyclonal populations for each biological replicate and each perturbation described in the paper.
Note: For double / combined perturbation, derive polyclonal population harboring one perturbation and then repeat above steps for second perturbagen. Be sure to use perturbagens with distinct selectable markers to enable selection of pure populations of perturbed cells harboring both perturbations. When selecting with multiple selective agents, use half the dosage recommended above to avoid non-specific cell killing.
Extract RNA and measure gene expression in perturbed mp53/Ras cell populations should be determined whether growth in the presence or absence of serum is more appropriate, as serum can have substantial impact on observed gene expression patterns. Following serum starvation, cells are harvested for RNA isolation and reverse transcription to generate cDNA that is used for TaqMan quantitative PCR assays.
12. Trypsinize and count perturbed cell populations to be used experimentally. Include an empty vector control in each experiment performed. For mp53/Ras cells, plate 250,000 cells per 10-cm dish. 13. Plate cells onto fresh collagen-coated dishes with 39 C media, which excludes interferon gamma and any eukaryotic-selective agents from the media (i.e., should be free of puromycin or similar). The media should contain anti-microbials, here kanamycin and gentamycin. 14. Allow cells to adhere and grow in 39 C CO 2 incubator for 48 h. 15. Serum-starve cells by aspirating media from each dish and replacing with fresh 39 C media without serum. 16. Allow cells to grow for an additional 24 h in 39 C CO 2 incubator. 17. Harvest cells following trypsinization. Inhibit trypsin activity using 39 C media with serum, then immediately collect each cell population into 15 mL conical tubes (one per perturbation) and pellet cells by centrifugation at 500 g for 5 min at 4 C. 18. Aspirate media and trypsin from cell pellets. Re-suspend cells in 5 mL 13 PBS (can be room temperature or ice-cold), then pellet again by centrifugation. 19. Aspirate PBS from cell pellets. Use immediately for RNA extraction OR snap freeze and store at À20 C. a. Cell pellets can be stored at À20 C for up to one month prior to RNA extraction. 20. Isolate RNA by following manufacturer's protocol for the QIAGEN RNeasy Mini kit with On-Column DNase Digestion. a. Isolated RNA can be stored at À20 C for two years or longer if handled in an RNase-free manner and not exposed to repeated freeze-thaw cycles. 21. Prepare reverse transcription reactions: a. Denature 10 mg of each RNA sample to be used by incubation on a heat block or in a water bath at 70 C for 10 min. b. Plunge samples into ice to stop denaturation. Then add the components listed in the 26. Load mixture into each of 8 ports on the array at 100 mL per port.
a. Each individual sample of cDNA sample was processed on a separate card.
Note: As described below in step 32 and further sections on TopNet modeling, four biological replicates of each perturbation were measured.
Note: As noted in step 13 of the before you begin section, we considered a replicate to be an independently derived population of perturbed cells. Each replicate of a given perturbation had an empty vector control population of cells that was derived in parallel with the perturbed cell populations.
27. Seal arrays with a TaqMan Low-Density Array Sealer (Applied Biosystems) to prevent crosscontamination. 28. Run real-time amplifications on an ABI Prism 7900HT Sequence Detection System (Applied Biosystems) with a TaqMan Low Density Array Upgrade.
29. After real-time amplification is complete, obtain threshold cycle (C t ) values via Sequence Detection Software (SDS, Applied Biosystems) following manufacturer's instructions. a. SDS is graphical user interface software designed for use by wet-lab biologists. For this protocol, each sample was analyzed individually using this software package.

Reverse transcription reaction steps
Step

Normalization and missing data imputation
Timing: $1 h 32. Select one or more control features to normalize the data. In this example, we use Becn1 as a control because it appears to track the lower quantile of the distribution of expression across the controls reasonably well ( Figure 1A) and has relatively low variability across the control samples compared to the other genes ( Figure 1B). This suggests that normalizing to this housekeeping gene may perform well in these data. a. Examine variability in the distribution of expression across control samples. While some variability might be ascribed to the different control vectors, it is unlikely that the effect would be of this magnitude and consistency. Moreover, note that while there is substantial variability in the location of the expression distribution across the empty vector control samples, the range (e.g., IQR) remains relatively constant (except for one sample with the lowest overall expression that also had noticeably higher variability).
Note: The proportion of non-detects general increases for larger Ct values (lower gene expression). However, if this is not the case, an alternative imputation procedure, such as mean imputation, should be used. Additionally, if there are a large number of non-detects spread randomly across all measured genes, this may indicate poor sample quality.
Additionally, the model_prep function converts the data to the qPCRset object format needed to impute the non-detects and returns normalization factors. Specifically, the sampleType field of the sample annotation contains a unique description of each experiment: which gene(s) were perturbed and the direction of perturbation. This is used to define replicates for use in the imputation. b. Treat non-detects as non-random missing data and impute using the R/Bioconductor package nondetects (McCall et al., 2014). This replaces missing values (non-detects) with an imputed values based on an estimated missing data mechanism as well as the expression values seen in replicate experiments.
Note: Attempting to measure lowly expressed genes will result in a higher number of non-detects. As the number of non-detects increases, the imputation procedure becomes less reliable. Therefore, while there isn't a universal threshold for the applicability of the non-detects imputation, there is a trade-off between the ability to measure lowly expressed genes and the reliability of the imputation. Often this can be addressed by increasing the number of replicates such that the number of observed values for each gene is sufficiently large.
Note: This function takes a while to run.
c. Normalize the data. Here, we normalize the data to the house-keeping gene, Becn1.
If we were normalizing to multiple control genes, we would first calculate the mean expression of the control genes then proceed as above.
Note: After normalization, we have lost the cycle threshold (Ct) interpretation but retained the inverse relationship between expression values (on the Ct scale) and the amount of transcript in the sample. Therefore, we consider the negative DCt value as our measure of normalized expression.

OPEN ACCESS
In the previous sections we have dealt with non-detects (missing values) and normalized the data. We now turn out attention to assessing which genes are up-or down-regulated in response to each perturbation.

Calculate
DDCt values that quantify the change in expression from controls for each perturbation. a. Examine whether a given perturbed sample is more similar to the control sample from the same batch then to control samples from other batches by running the check_matched_ controls function. If there is a batch-effect, it is advantageous to compare each perturbed sample to the control sample from the same batch.
b. In almost all cases, the control sample from the same batch is the most similar to the perturbed sample (Figure 2). This suggests that there is a difference between batches and motivates the calculation of paired DDCt values as our measure of normalized change in expression in response to each perturbation. If there were no difference between batches, the distance from a given perturbed sample to its matched control (points in Figure 2) would be randomly distributed within the set of pairwise differences. An eyeball test is often sufficient to assess this; however, a permutation-based statistical test could also be used. If a batch effect does not exist, then we can simply compare each perturbed sample to the average of all the control samples to compute unpaired DDCt values. c. Calculate paired DDCt values by computing the difference in expression between each perturbed sample and its corresponding control sample from the same batch by running calculate_ddCt().
d. One final check of the non-detect imputation procedure from before is to examine the distribution of residuals stratified by the presence of imputed non-detect values. These can exist in either the perturbed sample, the control sample, or both samples.

check_matched_controls(crgdataNorm)
ddCt <-calculate_ddCt(crgdataNorm) Figure 2. Matched control samples capture potential batch effects For each perturbed sample (x-axis), the distribution of Euclidean distances from that the vector of Ct values for that perturbed sample to each control sample is shown. The matched control sample for each perturbed sample is highlighted in red. In general, a perturbed sample is most similar to its corresponding control sample, suggesting that matched control and perturbed samples are similarly affected by potential batch effects.

OPEN ACCESS
Here, we see what one would expect: a median of zero and roughly equal spread when there are no non-detects, a median slightly below zero when there is a non-detect in only the perturbed sample, and a median slightly above zero when there is a non-detect in only the control sample ( Figure 3). e. Calculate approximate z-scores for each perturbation after removing any control genes.
Here, we remove the control gene Becn1.
f. Calculate the probability of up-/ down-regulation in response to each perturbation by fitting a uniform / normal / uniform mixture model.
g. Filter genes that were measured but are not perturbed in any experiment. While these genes are useful in modeling the missing data mechanism for non-detect imputation and estimating probabilities of up-/ down-regulation, they will not be used in the subsequent network modeling and can be removed at this point. Here, we also filter several samples that do not represent the type of perturbation being modeled, a single perturbation back to normal expression levels.
h. Format the probabilities for input to the network model fitting algorithm. probabilities <-calculate_probs(zscores) perts <-gsub(":.+","",colnames(probabilities)) ind <-which(!rownames(probabilities) %in% perts) probabilities <-probabilities [-ind, -c(1,9,12,13,20,23)] -c (1,9,12,13,20,23)  Optional: Connectivity graph analysis. Either the z-scores or probabilities could be thresholded to produce a connectivity graph, in which nodes represent genes and directed edges denote that perturbation of the parent gene results in a change in expression of the child gene. Here, we create a connectivity graph based on the probabilities by thresholding the probabilities at an absolute value of 0.5. In other words, if the probability that perturbation of gene A results in a change in expression of gene B exceeds 0.5 then an edge from gene A to gene B is included in the connectivity graph.

Perform network modeling
Timing: $5 days This step estimates the network of interactions.
35. Network modeling. Use a ternary network model that accounts for the dynamic nature of gene regulatory networks and facilitates the evaluation of uncertainty to model a gene regulatory network. Specifically, use a parallel tempering algorithm (Swendsen and Wang, 1986) to search the model space for networks that produce attractors that are most similar to the observed steady state data. Pseudocode for the network modeling algorithm is supplied in Methods S1, while a reproducible workflow of the analyses in this paper is included as the vignette of the crgnet package.
Note: Unlike previous approaches, here we have incorporated uncertainty in the differential expression estimates via probabilities of up-/ down-regulation. This allows the network model to give more weight to data points with higher certainty. Additionally, the ability to produce non-integer network scores eased transitions between network models and significantly decreased computational time.
a. Here, we show example code to fit a network using 1,000,000,000 cycles in parallel across 20 processors with temperatures ranging from 0.001 to 1. These parameters should be chosen such that either the network reaches a score of zero (indicating a fit that perfectly explains the observed data) or until additional cycles do not produce a reduction in the network score. The code used to produce these network fits is shown below: "Not Tumor Inhibitory"), title="Node Color", fill=c("yellow","grey")) legend("topright", c("Up-regulation", "Down-regulation"), title="Edge Color", col=c("red","blue"), lty=1, lwd=5) library("ternarynet") data("crgnet_scores") results <-parallelFit(experiment_set=crgnet_scores,  (112358) ) Note: The computational time required to generate these network models is substantial: each independent network fit presented in McMurray et al. (2021) took approximately 12 h to run in parallel on 20 compute nodes. However, less than 1 GB of RAM was sufficient to fit these network models.

Create network summaries and visualization
Timing: $1 h ($1 month with optional steps included) This step prepares the data for input to the network estimation algorithm.
36. Summary statistics can be computed by calculating the proportion of networks in which a given feature or features are present. One can also examine the transition functions, attractors, and trajectories all stored in the fits object. One of the most common ways to visualize a network model is to present the topology. Here we calculate proportion of networks in which a given gene is a parent of another given gene.
This information can be exported to Cytoscape or other network visualization software to create a graphical representation of these results.
Optional: One question of interest is whether it is significant that one can obtain a low scoring network model. This can be examined by permuting the network input data. For each gene, permute its response to all of the experiments while retaining the number of experiments to which each gene responds. In other words, the number of parents in a connectivity graph remains constant but which other genes are parents' changes. Then fit a network model to the permuted data using the same parameters as above and repeat this process to generate multiple permuted fits. Here, we load precomputed permuted fits from the crgnet package.
Optional: To examine whether we could obtain similar fits with a simpler network model, one can vary the in-degree and compare model fits. As an example, we reran the network modeling algorithm with the max_parents reduced from 4 to 3 and also considered a more complex model by increasing the max_parents parameter to 5. We ran both models on permuted data as well as the real data.

OPEN ACCESS
STAR Protocols 3, 101737, December 16, 2022 Note: Figure 6 illustrates that increasing the in-degree cap from 3 to 4 results in a sizeable reduction in the model score; however, increasing the in-degree cap to 5 produces only a modest improvement (the in-degree 4 network models already do quite well). Regardless of the in-degree cap, better scores were achieved using the real data (as expected). The separation between real and permuted scores is greatest for an in-degree cap of 4. This lends further support to the choice of a maximum in-degree of four for these data.
Optional: Comparison between network models generated using different in-degree thresholds. Here, we demonstrate the effect of the maximum in-degree on the resulting network topology. Note that a larger in-degree threshold will produce a network with more edges. Of primary interest is whether the high confidence edges are retained for varying in-degrees.

OPEN ACCESS
Testing tumor formation capacity of cells with two genetic perturbations based on features of the network model

Timing: 5-6 weeks
Here, we describe the process of measuring tumor formation in perturbed mp53/Ras cell populations and linking these measurements to features of the network model. This is done by perturbation of multiple target genes selected from among the network nodes in mp53/Ras cells. Subsequently, the perturbed cell populations are implanted into allogeneic, immune compromised mice and tumor growth is measured by monitoring tumor size over time. We perform these studies in CD-1 nude mice (Crl:CD-1-Foxn1nu, Charles River Laboratories, purchased at 6-8 weeks of age), but a number of other immune compromised mouse strains could be used (e.g., NOD/SCID animals).  The network modeling algorithm was run with in-degree limits of 3, 4, and 5 on both real and permuted data. Larger indegree limits produce generally better scores, and the real data produce better scores than the permuted data across all three in-degree limits.
43. Use tumor diameter data to calculate tumor volume using the standard formula for volume of a sphere (volume=(4/3)pr 3 ). 44. Examine and quantify the association between predicted gene interactions and phenotypic outcomes (here tumor size).

EXPECTED OUTCOMES
TopNet provides an integrated pipeline to process data from perturbation experiments, generate network models consistent with the observed data, and summarize and visualize these networks. The outputs of this pipeline are the network topology and set of transition functions that define the network, as well as several parameters that can be used to assess the quality of the network model. The output of TopNet can also be readily visualized in Cytoscape and other network visualization tools. As demonstrated in (McMurray et al., 2021), summaries of the network topology can be analyzed in concert with phenotypic variables of interest (e.g., tumor growth) to pinpoint biologically relevant aspects of the network architecture.

LIMITATIONS Generating cell populations with gene perturbations via retroviral vectors
The pBabe retroviral packaging system cannot effectively transduce gene expression inserts larger than about 7,000 bp, due to the retroviral capsid particle size and thus the amount of nucleic acid that can be packaged.
The pSuper-retro vector system delivers shRNA-expressing constructs to the target cell population. Despite following specific shRNA design guidelines, not every shRNA insert will effectively reduce the expression of the targeted gene. Thus, it is necessary to test multiple shRNAs that target distinct sequences in a given gene to identify those that successfully knock down the gene's expression.
shRNA is known to have potential off-target effects. Thus, it is advisable to independently derive cell populations using constructs that target distinct sequences within a given gene of interest to ensure that observed phenotypes are consistent between constructs and target sequences, and thus unlikely to result from off-target effects of a single vector.

Control of perturbagen expression
When perturbing gene expression using these systems, there is little control over the amount of perturbation that is achieved. For experiments described in McMurray et al. (2021), every derivation of each perturbed mp53/Ras cell population was compared to empty vector-infected mp53/Ras control populations derived in parallel. Expression of the perturbed gene was also compared to parental YAMC expression levels to discern whether the gene's expression was re-set in perturbed mp53/Ras cells to YAMC levels, our target for these experiments. As described previously (McMurray et al., 2008(McMurray et al., , 2021, cell populations that did not meet this standard were either excluded from further studies or results were interpreted in the context of known over-expression.

Flexibility of TopNet modeling
TopNet requires data in which each node in the network is both measurable and perturbable. In McMurray et al. (2021), the nodes of the network were genes measured via qPCR and perturbed using lentiviral vectors. While applicable to more than just gene expression data, TopNet cannot currently handle data in which the perturbations are not also the nodes of the network, e.g., chemical compound perturbation.

Scalability of TopNet modeling
The size of the network model space scales rapidly with both the number of nodes and the maximum number of parents per node (the in-degree limit). This means with current computational resources, networks larger than 1,000 nodes are likely infeasible to estimate. However, advances in computational power and improvements in search algorithms will likely reduce this limitation over time.

TROUBLESHOOTING
Problem 1 Inability to derive a polyclonal cell population following retroviral infection and drug selection.
This problem can arise for several reasons that all lead to the same problem -insufficient expression of the resistance gene encoded by the vector, such that transduced cells succumb to drug selection. This can happen due to 1) poor transduction of cells with the retroviral supernatants, 2) insufficient viral titers to generate an efficient infection, 3) sensitivity of the particular cell line / cell type to the drug of choice or 4) failing to wait for cells to undergo at least one cell division before beginning drug selection (retroviruses require cell division for integration into the host cell genome).

Potential solution
The solution depends on the source of the problem.
Poor transduction (1) may be improved by increasing the amount of polybrene mixed into the retroviral supernatants during infection. This is limited by overt toxicity of this additive, so test a range of dosages on the cells of interest to identify a maximum concentration. Insufficient viral titers may be overcome by improving transfection efficiency to produce more retrovirus and/or by using more supernatant or concentrating supernatants to provide more infectious units to the target cells. Cell sensitivity to the selection agent can be examined by testing a range of drug dosages on uninfected cells. Choose the minimum dose to achieve the desired effect -either cytotoxicity or cytostatic effects. Alternatively, inserts can be cloned into vectors with different drug selectable markers to allow for use of a selection agent with less baseline toxicity to the target cells. If it is necessary to move rapidly from infection to selection, use of lentiviral vectors may be preferable. Lentiviruses are able to infect non-dividing cells and thus generally have higher infection efficiencies than retroviral vector systems.

Problem 2
The network returns a null model after a short runtime. This could arise in two situations: (1) the experimental data itself does not contain any connections, or (2) the input data to the network fitting algorithm was incorrectly formatted.

Potential solution
First, examine a connectivity graph to assess whether any nodes in the network response to perturbation of other nodes. If the connectivity graph contains no edges, then it is possible that the perturbations that were performed did not produce changes in any of the measured nodes. However, if the connectivity graph contains edges, the input data to the network fitting algorithm was likely incorrectly formatted. An example of correctly formatted data is available in the crgnet package and helper functions to produce correctly formatted data are available in the ternarynet package.

Problem 3
The network model never runs to completion.
This problem may arise when the number of nodes in the network or the number of possible parents for each node (the in-degree limit) are too large for the available computation resources used to run the network fitting algorithm.

Potential solution
This problem can be solved by: (1) reducing the size of the network model (either the number of nodes or the in-degree limit) or (2) increasing the available computational resources, e.g., running the code on a high performance computing cluster or in the cloud. Additionally, if running the network fitting algorithm using the serial simulated annealing search algorithm, it could help to instead use the parallel replica exchange Monte Carlo search algorithm. Figure 7 shows the normalized network scores (zero corresponds to a perfect fit) for varying network sizes (64, 128, and 256 nodes), vary in-degree limits of 1-5, and varying number of replicas using in the replica exchange Monte Carlo search algorithm.

Problem 4
The network model returns a score far from zero. This problem may arise for several reasons: (1) the input scores are on a different scale, (2) the network search algorithm was not allowed to run for a sufficient length of time, (3) the data contain seemingly contradictory responses to the perturbation experiments.

OPEN ACCESS
Potential solution This will likely take some investigation. First, to check if possibility (1) is correct, one can calculate the minimum possible score from the input data. If this is far from zero, an error may have occurred in generating the scores. Alternatively, if using a custom scoring function that simply produces scores on a different scale, one could consider shifting or rescaling the input scores. Second, to check if possibility (2) is the issue, simply run the network model for more cycles and see if the scores improve. One can examine the scores as a function of cycles (see Figure 7 for examples) to determine whether the scores are still decreasing when the maximum number of cycles is reached. Finally, possibility (3) is the most challenging to assess but is a likely cause if possibilities (1) & (2) are ruled out. In this situation, the best approach is to increase the in-degree limit of the network model.

Problem 5
The network model returns few if any high confidence edges.
This problem may arise when there is insufficient information to resolve uncertainty in the network model space.

Potential solution
The most direct solution would be to perform additional perturbation experiments to reduce modeling uncertainty. Alternatively, one could restrict the model space algorithmically by decreasing the maximum in-degree allowed.

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Matthew McCall (matthew_mccall@urmc.rochester.edu).

Materials availability
Plasmids generated in this study are available upon request.
Primer sequences and TaqMan probe sets are tabulated in the key resources table.
There are restrictions to the availability of YAMC cells due to materials transfer agreement.
Data and code availability All qPCR data are available in the supplemental information of McMurray et al., Cell Reports, Dec 2021. Tumor volume data reported in this paper will be shared by the lead contact upon request.
The pseudocode for the network modeling algorithm is supplied in Methods S1, while a reproducible workflow of the analyses in this paper is included as the vignette of the crgnet package.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.